Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Neural Information Processing Systems

Metric learning aims to learn a distance measure that can benefit distance-based methods such as the nearest neighbour (NN) classifier. While considerable efforts have been made to improve its empirical performance and to analyze its generalization ability by focusing on the data structure and model complexity, an unresolved question is how choices of algorithmic parameters, such as the number of training iterations, affect metric learning, which is typically formulated as an optimization problem and nowadays more often as a non-convex one. In this paper, we theoretically address this question and prove the agnostic Probably Approximately Correct (PAC) learnability of metric learning algorithms with non-convex objective functions optimized via gradient descent (GD); in particular, our theoretical guarantee takes the iteration number into account. We first show that a generalization PAC bound is a sufficient condition for agnostic PAC learnability, and that this bound can be obtained by ensuring uniform convergence on a densely concentrated subset of the parameter space. We then show that, for classifiers optimized via GD, generalizability can be guaranteed if the classifier and the loss function are both Lipschitz smooth, and further improved by using fewer iterations. To illustrate and exploit these theoretical findings, we finally propose a novel metric learning method called Smooth Metric and representative Instance LEarning (SMILE), designed to satisfy the Lipschitz smoothness property and learned via GD with an early stopping mechanism, yielding better discriminability and lower computational cost for NN classification.
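To make the described approach concrete, below is a minimal sketch of a SMILE-like learner: a Gaussian-kernel classifier whose linear metric and representative instances are both trainable, optimized by full-batch gradient descent with early stopping. The class names, the kernel parameterization M = L^T L, the surrogate loss, and the validation-based stopping rule are all assumptions made for illustration; this is not the authors' implementation.

import torch
import torch.nn.functional as F

class SmoothPrototypeClassifier(torch.nn.Module):
    """Gaussian-kernel classifier with a learnable linear metric and learnable prototypes."""
    def __init__(self, dim, n_classes, protos_per_class=3, bandwidth=1.0):
        super().__init__()
        k = n_classes * protos_per_class
        self.n_classes = n_classes
        self.bandwidth = bandwidth
        self.L = torch.nn.Parameter(torch.eye(dim))                 # metric factor, M = L^T L
        self.prototypes = torch.nn.Parameter(torch.randn(k, dim))   # learnable representative instances
        # fixed class label assigned to each prototype
        self.register_buffer("proto_labels",
                             torch.arange(n_classes).repeat_interleave(protos_per_class))

    def forward(self, x):
        # squared Mahalanobis distances between inputs and prototypes
        d2 = torch.cdist(x @ self.L.T, self.prototypes @ self.L.T) ** 2
        w = torch.softmax(-d2 / self.bandwidth, dim=1)               # smooth (Lipschitz) kernel weights
        onehot = F.one_hot(self.proto_labels, self.n_classes).float()
        return w @ onehot                                            # class probabilities, rows sum to 1

def train_gd(model, X, y, X_val, y_val, lr=0.1, max_iters=500, patience=20):
    """Full-batch gradient descent with early stopping on a held-out split."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)                 # full-batch updates = plain GD
    best_val, best_state, wait = float("inf"), None, 0
    for _ in range(max_iters):
        opt.zero_grad()
        loss = F.nll_loss(torch.log(model(X) + 1e-12), y)
        loss.backward()
        opt.step()
        with torch.no_grad():
            val = F.nll_loss(torch.log(model(X_val) + 1e-12), y_val).item()
        if val < best_val - 1e-4:
            best_val, wait = val, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            wait += 1
            if wait >= patience:    # stop early: the bound also improves with fewer GD iterations
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model

The validation-based patience rule here is only an illustrative stand-in; the paper motivates early stopping through its generalization bound, which tightens as the number of GD iterations decreases.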


Review for NeurIPS paper: Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Neural Information Processing Systems

The SMILE algorithm is basically the Nadaraya-Watson estimator (with a Gaussian kernel under the Mahalanobis metric instead of the Euclidean metric) in which the support vectors are also learnt, instead of using the training points as support vectors. It is not clear how much advantage the SMILE classifier gains by learning the representative instances. I suspect that they may account for much of the advantage SMILE has over the other algorithms, since the SMILE classifier is otherwise the very simple Nadaraya-Watson estimator, as pointed out above. Moreover, the competing algorithms were not offered a chance to similarly learn nice prototypes. One way to rebut this criticism would be to run SMILE restricted to using a subset of the training points as representative instances, or else to award all other methods a comparable advantage. It would also have been nice if some contemporary applications with VAEs or deep metric learning could have been explored.
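For reference, the estimator the reviewer describes can be written as follows (notation assumed for illustration; the paper's exact parameterization may differ):

\[
\hat f(x) \;=\; \frac{\sum_{j} y_j \, K_M(x, c_j)}{\sum_{j} K_M(x, c_j)},
\qquad
K_M(x, c) = \exp\!\left(-\frac{(x-c)^\top M \,(x-c)}{\sigma^2}\right),
\quad M = L^\top L \succeq 0 .
\]

In the classical Nadaraya-Watson estimator the centres c_j are the training points and M is the identity (Euclidean) metric; the reviewer's point is that SMILE additionally learns both the metric M and the representative instances c_j.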


Review for NeurIPS paper: Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Neural Information Processing Systems

The four referees support acceptance of the contribution, and I also recommend acceptance. However, please consider revising your paper to address the lack of novelty of Lemma 1, to add the generalization bound of SMILE, and to make precise the fact that your result is rather general and not specific to metric learning.

